bias type
Learning Debiased Representation via Disentangled Feature Augmentation
These biased models suffer from poor generalization capability when evaluated on unbiased datasets. Existing approaches for debiasing often identify and emphasize those samples with no such correlation (i.e., bias-conflicting) without defining the bias type in advance. However, such bias-conflicting samples are significantly scarce in biased datasets, limiting the debiasing capability of these approaches.
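A minimal sketch of the upweighting idea behind this line of work: samples that an intentionally biased auxiliary model finds hard are treated as likely bias-conflicting and are emphasized in training. This is a generic illustration of the heuristic, not the paper's disentangled feature augmentation; the input lists of per-sample losses are assumed to come from a biased and a debiased model respectively.

```python
# Sketch: per-sample weights from the relative difficulty of a biased model.
# `loss_biased` and `loss_debiased` are illustrative inputs, not the paper's code.

def conflict_weights(loss_biased, loss_debiased, eps=1e-8):
    """Weights in [0, 1]; samples the biased model finds hard (high loss)
    are treated as likely bias-conflicting and upweighted."""
    return [lb / (lb + ld + eps) for lb, ld in zip(loss_biased, loss_debiased)]

# Toy usage: sample 0 is easy for the biased model (bias-aligned),
# sample 1 is hard for it (likely bias-conflicting).
print(conflict_weights([0.1, 2.3], [0.9, 0.8]))  # ~[0.10, 0.74]
```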
- North America > United States > Texas > Travis County > Austin (0.04)
- Asia > China (0.04)
Echoes of AI Harms: A Human-LLM Synergistic Framework for Bias-Driven Harm Anticipation
Tantalaki, Nicoleta, Vei, Sophia, Vakali, Athena
The growing influence of Artificial Intelligence (AI) systems on decision-making in critical domains has exposed their potential to cause significant harms, often rooted in biases embedded across the AI lifecycle. While existing frameworks and taxonomies document bias or harms in isolation, they rarely establish systematic links between specific bias types and the harms they cause, particularly within real-world sociotechnical contexts. Technical fixes proposed to address AI biases are ill-equipped to do so and are typically applied only after a system has been developed or deployed, offering limited preventive value. We propose ECHO, a novel framework for proactive AI harm anticipation through the systematic mapping of AI bias types to harm outcomes across diverse stakeholder and domain contexts. ECHO follows a modular workflow encompassing stakeholder identification, vignette-based presentation of biased AI systems, and dual (human-LLM) harm annotation, integrated within ethical matrices for structured interpretation. This human-centered approach enables early-stage detection of bias-to-harm pathways, guiding AI design and governance decisions from the outset. We validate ECHO in two high-stakes domains (disease diagnosis and hiring), revealing domain-specific bias-to-harm patterns and demonstrating ECHO's potential to support anticipatory governance of AI systems.
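A rough sketch of an ethical-matrix-style structure consistent with the workflow described above: cells indexed by (stakeholder, bias type) collect harm annotations from human and LLM annotators separately, so that agreement between the two sources can be inspected. Field names and the demo entries are illustrative assumptions, not the authors' schema.

```python
from collections import defaultdict

def build_ethical_matrix(annotations):
    """annotations: iterable of dicts with keys
    'stakeholder', 'bias_type', 'harm', 'source' ('human' or 'llm')."""
    matrix = defaultdict(lambda: {"human": set(), "llm": set()})
    for a in annotations:
        matrix[(a["stakeholder"], a["bias_type"])][a["source"]].add(a["harm"])
    return matrix

def agreed_harms(matrix):
    """Harms flagged by both annotator types for the same cell."""
    return {cell: h["human"] & h["llm"] for cell, h in matrix.items()}

demo = [
    {"stakeholder": "patient", "bias_type": "representation bias",
     "harm": "misdiagnosis", "source": "human"},
    {"stakeholder": "patient", "bias_type": "representation bias",
     "harm": "misdiagnosis", "source": "llm"},
]
print(agreed_harms(build_ethical_matrix(demo)))
```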
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Greece > Central Macedonia > Thessaloniki (0.04)
- Europe > Switzerland (0.04)
- (22 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (0.93)
- Health & Medicine > Therapeutic Area (0.92)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector
Yang, Haoyan, Bao, Runxue, Xiao, Cao, Ma, Jun, Bhatia, Parminder, Gao, Shangqian, Kass-Hout, Taha
LLM-as-a-Judge has emerged as a promising tool for automatically evaluating generated outputs, but its reliability is often undermined by potential biases in judgment. Existing efforts to mitigate these biases face key limitations: in-context learning-based methods fail to address rooted biases due to the evaluator's limited capacity for self-reflection, whereas fine-tuning is not applicable to all evaluator types, especially closed-source models. To address this challenge, we introduce the Reasoning-based Bias Detector (RBD), a plug-in module that identifies biased evaluations and generates structured reasoning to guide evaluator self-correction. Rather than modifying the evaluator itself, RBD operates externally and engages in an iterative process of bias detection and feedback-driven revision. To support its development, we design a complete pipeline consisting of biased dataset construction, supervision collection, distilled reasoning-based fine-tuning of RBD, and integration with LLM evaluators. We fine-tune four sizes of RBD models, ranging from 1.5B to 14B, and observe consistent performance improvements across all scales. Experimental results on 4 bias types (verbosity, position, bandwagon, and sentiment) evaluated using 8 LLM evaluators demonstrate RBD's strong effectiveness. For example, the RBD-8B model improves evaluation accuracy by an average of 18.5% and consistency by 10.9%, and surpasses prompting-based baselines and fine-tuned judges by 12.8% and 17.2%, respectively. These results highlight RBD's effectiveness and scalability. Additional experiments further demonstrate its strong generalization across biases and domains, as well as its efficiency.
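A minimal sketch of the external detect-and-revise loop described in the abstract: the evaluator is never modified; a separate detector flags a possibly biased verdict and returns reasoning that is fed back as revision feedback. `evaluate_fn` and `detect_fn` are hypothetical callables standing in for the LLM evaluator and the RBD module.

```python
def judge_with_rbd(item, evaluate_fn, detect_fn, max_rounds=3):
    """Iteratively revise a verdict until the detector stops flagging bias."""
    verdict = evaluate_fn(item, feedback=None)
    for _ in range(max_rounds):
        report = detect_fn(item, verdict)   # e.g. {'biased': bool, 'reasoning': str}
        if not report["biased"]:
            break                           # accept the current verdict
        # feed the detector's structured reasoning back for self-correction
        verdict = evaluate_fn(item, feedback=report["reasoning"])
    return verdict
```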
- Europe > France (0.05)
- Europe > Spain > Galicia > Madrid (0.04)
- North America > United States > Tennessee > Sullivan County > Kingsport (0.04)
- (3 more...)
BTC-SAM: Leveraging LLMs for Generation of Bias Test Cases for Sentiment Analysis Models
Kardkovacs, Zsolt T., Djennane, Lynda, Field, Anna, Benatallah, Boualem, Gaci, Yacine, Casati, Fabio, Gaaloul, Walid
Sentiment Analysis (SA) models harbor inherent social biases that can be harmful in real-world applications. These biases are identified by examining the output of SA models for sentences that only vary in the identity groups of the subjects. Constructing natural, linguistically rich, relevant, and diverse sets of sentences that provide sufficient coverage over the domain is expensive, especially when addressing a wide range of biases: it requires domain experts and/or crowd-sourcing. In this paper, we present a novel bias testing framework, BTC-SAM, which generates high-quality test cases for bias testing in SA models with minimal specification using Large Language Models (LLMs) for the controllable generation of test sentences. Our experiments show that relying on LLMs can provide high linguistic variation and diversity in the test sentences, thereby offering better test coverage compared to base prompting methods even for previously unseen biases.
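A minimal sketch of the underlying bias test: sentences that differ only in the identity group of the subject should receive near-identical sentiment scores. BTC-SAM generates such sentences with LLMs; to stay self-contained this sketch uses a fixed template instead, and `sentiment_fn` is a hypothetical scorer returning a value in [-1, 1].

```python
def bias_test_cases(template, groups):
    """Build sentences that vary only in the identity group."""
    return {g: template.format(group=g) for g in groups}

def flag_divergence(cases, sentiment_fn, tol=0.1):
    """Flag the case set as biased if sentiment scores spread beyond `tol`."""
    scores = {g: sentiment_fn(s) for g, s in cases.items()}
    spread = max(scores.values()) - min(scores.values())
    return {"scores": scores, "biased": spread > tol}

cases = bias_test_cases("My {group} neighbour hosted a lovely dinner.",
                        ["Muslim", "Christian", "atheist"])
# flag_divergence(cases, sentiment_fn=my_model.predict)  # plug in a real SA model
```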
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (11 more...)
- Education (1.00)
- Health & Medicine (0.68)
IndiCASA: A Dataset and Bias Evaluation Framework in LLMs Using Contrastive Embedding Similarity in the Indian Context
S, Santhosh G, S, Akshay Govind, Krishnan, Gokul S, Ravindran, Balaraman, Natarajan, Sriraam
Large Language Models (LLMs) have gained significant traction across critical domains owing to their impressive contextual understanding and generative capabilities. However, their increasing deployment in high-stakes applications necessitates rigorous evaluation of embedded biases, particularly in culturally diverse contexts like India, where existing embedding-based bias assessment methods often fall short in capturing nuanced stereotypes. We propose an evaluation framework based on an encoder trained using contrastive learning that captures fine-grained bias through embedding similarity. We also introduce a novel dataset - IndiCASA (IndiBias-based Contextually Aligned Stereotypes and Anti-stereotypes) - comprising 2,575 human-validated sentences spanning five demographic axes: caste, gender, religion, disability, and socioeconomic status. Our evaluation of multiple open-weight LLMs reveals that all models exhibit some degree of stereotypical bias, with disability-related biases being notably persistent and religion-related bias generally lower, likely due to global debiasing efforts. These findings demonstrate the need for fairer model development.
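A minimal sketch of embedding-similarity bias scoring in the spirit of the framework described above: a model completion that sits closer (in embedding space) to the stereotype sentence than to its anti-stereotype counterpart is read as a bias signal. `encode_fn` is a hypothetical sentence encoder returning a vector of floats; the pairing convention is an assumption, not the paper's exact metric.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def stereotype_preference(model_output, stereotype, anti_stereotype, encode_fn):
    """Positive values mean the model's output is closer to the stereotype."""
    e_out = encode_fn(model_output)
    return cosine(e_out, encode_fn(stereotype)) - cosine(e_out, encode_fn(anti_stereotype))
```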
- Asia > India (0.25)
- North America > United States > Texas (0.04)
- North America > United States > Ohio (0.04)
- (4 more...)
- Health & Medicine (1.00)
- Education (0.68)
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
Xu, Xin, He, Xunzhi, Zhi, Churan, Chen, Ruizhe, McAuley, Julian, He, Zexue
Existing studies on bias mitigation methods for large language models (LLMs) use diverse baselines and metrics to evaluate debiasing performance, leading to inconsistent comparisons among them. Moreover, their evaluations are mostly based on the comparison between LLMs' probabilities of biased and unbiased contexts, which ignores the gap between such evaluations and real-world use cases where users interact with LLMs by reading model responses and expect fair and safe outputs rather than LLMs' probabilities. To enable consistent evaluation across debiasing methods and bridge this gap, we introduce BiasFreeBench, an empirical benchmark that comprehensively compares eight mainstream bias mitigation techniques (covering four prompting-based and four training-based methods) on two test scenarios (multi-choice QA and open-ended multi-turn QA) by reorganizing existing datasets into a unified query-response setting. We further introduce a response-level metric, Bias-Free Score, to measure the extent to which LLM responses are fair, safe, and anti-stereotypical. Debiasing performances are systematically compared and analyzed across key dimensions: the prompting vs. training paradigm, model size, and generalization of different training strategies to unseen bias types. We will publicly release our benchmark, aiming to establish a unified testbed for bias mitigation research.
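A rough sketch of a response-level score in the spirit of the Bias-Free Score described above: the fraction of model responses judged fair, safe, or anti-stereotypical, rather than a comparison of token probabilities. `classify_fn` is a hypothetical classifier mapping a (query, response) pair to one of the labels below; the benchmark's exact definition may differ.

```python
ACCEPTABLE = {"fair", "safe", "anti-stereotypical"}  # illustrative label set

def bias_free_score(pairs, classify_fn):
    """pairs: iterable of (query, response) tuples; returns a value in [0, 1]."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    ok = sum(1 for query, response in pairs
             if classify_fn(query, response) in ACCEPTABLE)
    return ok / len(pairs)
```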
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.05)
- (19 more...)
Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?
Wang, Yifan, Jobanputra, Mayank, Lee, Ji-Ung, Oh, Soyoung, Valera, Isabel, Demberg, Vera
Natural language processing (NLP) models often replicate or amplify social bias from training data, raising concerns about fairness. At the same time, their black-box nature makes it difficult for users to recognize biased predictions and for developers to effectively mitigate them. While some studies suggest that input-based explanations can help detect and mitigate bias, others question their reliability in ensuring fairness. Existing research on explainability in fair NLP has been predominantly qualitative, with limited large-scale quantitative analysis. In this work, we conduct the first systematic study of the relationship between explainability and fairness in hate speech detection, focusing on both encoder- and decoder-only models. We examine three key dimensions: (1) identifying biased predictions, (2) selecting fair models, and (3) mitigating bias during model training. Our findings show that input-based explanations can effectively detect biased predictions and serve as useful supervision for reducing bias during training, but they are unreliable for selecting fair models among candidates.
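A minimal sketch of one way input-based explanations can flag potentially biased hate-speech predictions: mask identity terms and measure how much the toxicity score drops. A large drop suggests the prediction leans on the identity mention itself. `score_fn` is a hypothetical callable returning the positive-class probability, and the occlusion heuristic is a simple stand-in for the attribution methods studied in the paper.

```python
IDENTITY_TERMS = {"muslim", "jewish", "gay", "women", "immigrants"}  # illustrative list

def identity_reliance(text, score_fn, mask="[MASK]"):
    """Score drop after masking identity terms; higher means stronger reliance."""
    tokens = text.split()
    masked = [mask if t.lower().strip(".,!?") in IDENTITY_TERMS else t
              for t in tokens]
    return score_fn(text) - score_fn(" ".join(masked))

# A prediction with high identity_reliance but no abusive content is a
# candidate biased (false-positive) prediction.
```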
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- (15 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
EchoBench: Benchmarking Sycophancy in Medical Large Vision-Language Models
Yuan, Botai, Zhou, Yutian, Wang, Yingjie, Huo, Fushuo, Jing, Yongcheng, Shen, Li, Wei, Ying, Shen, Zhiqi, Liu, Ziwei, Zhang, Tianwei, Yang, Jie, Tao, Dacheng
Recent benchmarks for medical Large Vision-Language Models (LVLMs) emphasize leaderboard accuracy, overlooking reliability and safety. We study sycophancy -- models' tendency to uncritically echo user-provided information -- in high-stakes clinical settings. We introduce EchoBench, a benchmark to systematically evaluate sycophancy in medical LVLMs. It contains 2,122 images across 18 departments and 20 modalities with 90 prompts that simulate biased inputs from patients, medical students, and physicians. We evaluate medical-specific, open-source, and proprietary LVLMs. All exhibit substantial sycophancy; the best proprietary model (Claude 3.7 Sonnet) still shows 45.98% sycophancy, and GPT-4.1 reaches 59.15%. Many medical-specific models exceed 95% sycophancy despite only moderate accuracy. Fine-grained analyses by bias type, department, perceptual granularity, and modality identify factors that increase susceptibility. We further show that higher data quality/diversity and stronger domain knowledge reduce sycophancy without harming unbiased accuracy. EchoBench also serves as a testbed for mitigation: simple prompt-level interventions (negative prompting, one-shot, few-shot) produce consistent reductions and motivate training- and decoding-time strategies. Our findings highlight the need for robust evaluation beyond accuracy and provide actionable guidance toward safer, more trustworthy medical LVLMs.
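A minimal sketch of a sycophancy measurement consistent with the setup described above: pose each question with a prompt that embeds an incorrect user suggestion and count how often the model echoes that suggestion. `answer_fn` is a hypothetical wrapper around a medical LVLM; the item fields are assumptions for illustration.

```python
def sycophancy_rate(items, answer_fn):
    """items: list of dicts with keys 'image', 'biased_prompt',
    'suggested_answer' (the incorrect option the prompt pushes toward)."""
    echoed = 0
    for it in items:
        answer = answer_fn(it["image"], it["biased_prompt"])
        if answer.strip() == it["suggested_answer"]:
            echoed += 1  # model uncritically followed the biased input
    return echoed / max(len(items), 1)
```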
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- (4 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Consumer Health (0.67)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Texas > Travis County > Austin (0.04)
- Asia > South Korea (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Vision (0.94)
- Information Technology > Artificial Intelligence > Natural Language (0.69)
HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America
Ivetta, Guido, Gomez, Marcos J., Martinelli, Sofía, Palombini, Pietro, Echeveste, M. Emilia, Mazzeo, Nair Carolina, Busaniche, Beatriz, Benotti, Luciana
Most resources for evaluating social biases in Large Language Models are developed without co-design from the communities affected by these biases, and rarely involve participatory approaches. We introduce HESEIA, a dataset of 46,499 sentences created in a professional development course. The course involved 370 high-school teachers and 5,370 students from 189 Latin-American schools. Unlike existing benchmarks, HESEIA captures intersectional biases across multiple demographic axes and school subjects. It reflects local contexts through the lived experience and pedagogical expertise of educators. Teachers used minimal pairs to create sentences that express stereotypes relevant to their school subjects and communities. We show the dataset's diversity both in terms of the demographic axes represented and the knowledge areas included. We demonstrate that the dataset contains more stereotypes unrecognized by current LLMs than previous datasets. HESEIA is available to support bias assessments grounded in educational communities.
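A minimal sketch of how a minimal-pair dataset like this one can be used to probe an LLM: for each (stereotype, anti-stereotype) pair, check whether the model scores the stereotypical sentence as more likely. `loglik_fn` is a hypothetical sentence log-likelihood function; a rate near 0.5 would suggest no preference for the stereotype.

```python
def stereotype_preference_rate(pairs, loglik_fn):
    """pairs: iterable of (stereotype_sentence, anti_stereotype_sentence)."""
    pairs = list(pairs)
    preferred = sum(1 for stereo, anti in pairs if loglik_fn(stereo) > loglik_fn(anti))
    return preferred / max(len(pairs), 1)
```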
- North America > Central America (0.40)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- South America > Bolivia (0.04)
- (10 more...)
- Research Report (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
- Education > Curriculum > Subject-Specific Education (0.67)
- Education > Educational Setting > K-12 Education > Secondary School (0.35)